33 research outputs found

    Menetelmiä luonnollisella kielellä kirjoitettujen raporttien automaattiseen tuottamiseen (Methods for the automatic generation of reports written in natural language)

    The use of computer software to automatically produce natural language texts expressing factual content is of interest to practitioners in multiple fields, ranging from journalists to researchers to educators. This thesis studies natural language report generation from structured data for the purposes of journalism. The topic is approached from three directions. First, we analyse what requirements the journalistic domain imposes on the software, and how software might be architected to account for those requirements. This includes identifying the key domain norms (such as the "objectivity norm") and business requirements (such as system transferability) and mapping them to software requirements. Based on the identified requirements, we then describe how a modular data-to-text approach to natural language generation can be implemented in the specific context of hard news reporting. Second, we investigate how the highly domain-specific natural language generation subtask of document planning - deciding what information is to be included in an automatically produced text, and in what order - might be conducted in a less domain-specific manner. To this end, we describe an approach to operationalizing the complex concept of "newsworthiness" in a manner that a natural language generation system can employ. We also present a broadly applicable baseline method for structuring the content in a data-to-text setting without explicit domain knowledge. Third, we discuss how bias in text generation systems is perceived by key stakeholders, and whether those perceptions align with the reality of news automation. This discussion includes identifying how automated systems might exhibit bias and how the biases might be - potentially unconsciously - embedded in the systems.
    As a result, we conclude that common perceptions of automated journalism as fundamentally "unbiased" are unfounded, and that beliefs about "unbiased" automation might have the negative effect of further entrenching pre-existing biases in organizations or society. Together, through these three avenues, the thesis sketches out a way towards more widespread use of news automation in newsrooms, taking into account the various ethical questions associated with the use of such systems.

    This doctoral thesis addresses the automatic generation of natural language (for example, Finnish or English) in settings where the factual correctness of the content is critical. Such computer systems are used, for example, to write weather reports, sports and financial news, and patient descriptions. The thesis approaches the topic from three perspectives, focusing in particular on journalism. First, the thesis examines how the journalistic context affects the way a computer system that generates natural language should be built. It analyses the norms and practices of journalism and translates them into software engineering requirements. Based on these requirements, the thesis identifies a software architecture for natural language generation that suits journalistic purposes. Second, the thesis looks into one subproblem of natural language generation, document planning. In the document planning stage, the pieces of information to be included in the text are selected and arranged so that they form a comprehensible text. This stage has commonly been regarded as one of the most application-dependent stages of text generation. This means that it must be solved separately for each application: a method that structures election news is not necessarily suitable for structuring financial news. The thesis analyses the concept of "newsworthiness" used in journalism and describes a method based on it for selecting pieces of information. In addition, the thesis presents a broadly applicable method for ordering the pieces of information. Together, these methods simplify the building of new text generation systems in certain contexts. Third, the thesis discusses biases in text generation systems. It describes how the people in key positions with regard to the journalistic use of automated text generation view the threat of bias, and how these views correspond to the reality of automated text generation. More specifically, it describes what kinds of biases may be found in automated text generation systems and how biases can end up in them. In this respect, the conclusion of the thesis is that automated text generation systems should not be regarded as inherently less biased than humans, and that beliefs about the built-in "fairness" of automated methods may lead to undesirable effects by entrenching the biases of organizations and society. Through these three perspectives, the thesis outlines a path towards wider use of automated text generation systems, particularly in newsrooms, in an ethically sustainable way.

    Short pauses while studying considered harmful

    Peer reviewed

    Pauses and spacing in learning to program

    Conventional wisdom holds that time is an integral part of the learning process. Spacing out learning over multiple study sessions seems to be better for learning than having a single longer study session. Learners should also take pauses from the learning process to absorb, assimilate, and analyze what they have just learned. At the same time, pausing too often can be harmful for learning. Participants of two subsequent introductory programming courses completed programming tasks in an integrated development environment that saved detailed logs of their actions, including time stamps of all the participants' keypresses in said environment. Using this data together with background variables and a self-regulation metric questionnaire, we study how the students space out their work, identify trends between the kinds of pauses the participants took and the course outcomes, and examine their connection to background variables. Based on our research, students tend to space out their work, working on multiple days each week. In addition, a high relative amount of pauses of only a few seconds correlated positively with exam scores, while a high relative amount of pauses of a few minutes correlated negatively with exam scores. Student pausing behaviors are poorly explained by traditional self-regulation measures such as the Motivated Strategies for Learning Questionnaire and other background variables. Peer reviewed

    Automated Journalism as a Source of and a Diagnostic Device for Bias in Reporting

    In this article we consider automated journalism from the perspective of bias in news text. We describe how systems for automated journalism could be biased in terms of both the information content and the lexical choices in the text, and what mechanisms allow human biases to affect automated journalism even if the data the system operates on is considered neutral. Hence, we sketch out three distinct scenarios differentiated by the technical transparency of the systems and the level of cooperation of the system operator, affecting the choice of methods for investigating bias. We identify methods for diagnostics in each of the scenarios and note that one of the scenarios is largely identical to investigating bias in non-automatically produced texts. As a solution to this last scenario, we suggest the construction of a simple news generation system, which could enable a type of analysis-by-proxy. Instead of analyzing the system, to which access is limited, one would generate an approximation of the system which can be accessed and analyzed freely. If successful, this method could also be applied to the analysis of human-written texts. This would make automated journalism not only a target of bias diagnostics, but also a diagnostic device for identifying bias in human-written news. Peer reviewed

    Unboxing news automation : Exploring imagined affordances of automation in news journalism

    News automation is an emerging field within journalism, with the potential to transform newswork. Increasing access to data, combined with developing technology, will allow further inquiries into automated journalism. Producing news text using NLG (natural language generation) is currently largely undertaken in specific, predictable news domains, such as sports or finance. This interdisciplinary study investigates how elite media representatives from Finland, Europe and the US imagine the affordances of this emerging technology for their organization. Our analysis shows how the affordances of news automation are imagined as providing efficiency, increasing output and aiding in reallocating resources to pursue quality journalism. The affordances are, however, constrained by such factors as access to structured data, the quality of automation and a lack of relevant skills. In its current form, automated text generation is seen as providing only limited benefits to news organizations that are already imagining further possibilities of automation. Peer reviewed

    Comparison of Time Metrics in Programming

    Research on the indicators of student performance in introductory programming courses has traditionally focused on individual metrics and specific behaviors. These metrics include the amount of time and the quantity of steps such as code compilations, the number of completed assignments, and metrics that one cannot acquire from a programming environment. However, the differences in the predictive powers of different metrics and the cross-metric correlations are unclear, and thus there is no generally preferred metric for examining time on task or effort in programming. In this work, we contribute to the stream of research on student time-on-task indicators through the analysis of a multi-source dataset that contains information about students' use of a programming environment, their use of the learning material, as well as self-reported data on the amount of time that the students invested in the course and per-assignment perceptions of workload, educational value and difficulty. We compare and contrast metrics from the dataset with course performance. Our results indicate that traditionally used metrics from the same data source tend to form clusters that are highly correlated with each other, but correlate poorly with metrics from other data sources. Thus, researchers should utilize multiple data sources to gain a more accurate picture of students' learning. Peer reviewed

    Using and Collecting Fine-Grained Usage Data to Improve Online Learning Materials

    As educators seek to create better learning materials, knowledge about how students actually use the materials is priceless. The advent of online learning materials has allowed tracking of student movement on levels not previously possible with on-paper materials: server logs can be parsed for details on when students opened certain pages. But such data is extremely coarse and only allows for rudimentary usage analysis. How do students move within the course pages? What do they read in detail and what do they glance over? Traditionally, answering such questions has required complex setups with eye tracking labs. In this paper we investigate how fine-grained data about student movement within an online learning material can be used to improve said material in an informed fashion. Our data is collected by a JavaScript component that tracks which elements of the online learning material are visible in the student's browser window as they study. The data is collected in situ, and no software needs to be installed on the student's computer. We further investigate how such data can be combined with data from a separate learning environment in which students work on course assignments, and whether the types of movements made by the students are correlated with student self-regulation metrics or course outcomes. Our results indicate that the use of rather simple and non-invasive tracking of students' movements in course materials allows material creators to quickly see major problem areas in their materials and to highlight sections that students keep returning to. In addition, when the tracking data is combined with student course assignment data, inferring meaningful assignment-specific areas within the course material becomes possible. Finally, we determine that high-level statistics of user movements are not correlated with course outcomes or certain self-regulation-related metrics. Peer reviewed

    Personal Research Assistant for Online Exploration of Historical News

    Demonstration paper. We present a novel environment for exploratory search in large collections of historical newspapers developed as a part of the NewsEye project. In this paper we focus on the intelligent Personal Research Assistant (PRA) component in the environment and the web interface. The PRA is an interactive exploratory engine that combines results of various text analysis tools in an unsupervised fashion to conduct autonomous investigations on the data according to users' needs. The PRA is freely available online together with some datasets of European historical newspapers. The methods used by the assistant are of potential benefit to other exploratory search applications. Peer reviewed